BUPT at TRECVID 2008
Authors
Abstract
High-level feature extraction

We describe our system for the high-level feature (HLF) extraction task in TRECVID 2008. Features at different granularities are extracted to describe the visual content of keyframes, four classifiers are trained for each concept with SVMs, and different fusion strategies are used to combine the final results. A brief introduction to each run is given in Table 1.1.

Table 1.1. infMAP and description of the HLF runs

Run         infMAP   Description
BUPT_Sys1   0.047    Classifiers trained on annotation-III, maximum fusion scheme
BUPT_Sys2   0.043    Classifiers trained on annotation-III, average fusion scheme
BUPT_Sys3   0.039    Classifiers trained on annotation-II, maximum fusion scheme
BUPT_Sys4   0.037    Classifiers trained on annotation-II, average fusion scheme
BUPT_Sys5   0.037    Classifiers trained on annotation-I, maximum fusion scheme
BUPT_Sys6   0.034    Classifiers trained on annotation-I, average fusion scheme

Our best run is slightly better than the average over the participating groups, but the system did not perform as well as in our earlier testing.

Copy detection

Content-based copy detection (CBCD) aims at finding, within a test collection, copies of query videos under all of the specified transformations. The visual features we use are robust to the intensity offsets and color distortions between the query clips and the test collection to be searched. In addition, different search strategies are applied to long and short query clips, which gives a tradeoff between accuracy and speed. The evaluation results show that our CBCD algorithm achieves high precision and mean F1, but it also misses many copies.

This year, the Multimedia Communication and Pattern Recognition Lab of the School of Telecommunication Engineering, Beijing University of Posts and Telecommunications (BUPT), took part in three tasks of TRECVID 2008: high-level feature extraction, copy detection and event detection.*

* This work was supported by the China National Natural Science Foundation under Project 60772114.

1. High-level feature extraction

1.1 Different annotations

Three kinds of annotations based on the TRECVID 2007 development and test datasets are used this year. They differ as follows:

Annotation-I: supplied by MCG-ICT-CAS [1].

Annotation-II: to measure the effect of different annotations, we re-annotated the TRECVID 2007 development corpus. The 20 concepts are divided into two groups according to the TRECVID 2008 concept descriptions. The first group holds the region-related concepts, for which rectangles are used to locate local objects such as dog, bus and telephone; it includes 13 concepts: 002 Bridge, 003 Emergency_Vehicle, 004 Dog, 006 Airplane_flying, 007 Two_people, 008 Bus, 009 Driver, 010 Cityscape, 012 Telephone, 013 Street, 015 Hand, 018 Boat_Ship, 019 Flower. The second group holds the global scene-related concepts, for which the whole frame describes the concept, such as classroom and kitchen; it includes: 001 Classroom, 005 Kitchen, 011 Harbor, 014 Demonstration_Or_Protest, 016 Mountain, 017 Nighttime, 020 Singing.

Annotation-III: we re-annotated the TRECVID 2007 development corpus with the IBM MPEG-7 Annotation Tool [2], treating the whole frame as the concept object.

1.2 Feature extraction

It is difficult to develop a single feature that is both invariant to various disturbances and sensitive enough to capture the details of the content. Integrating several complementary features to describe the content is a more promising way, so features at different granularities are extracted: global features for the scene-related concepts and local features for the region-related concepts. To increase the stability of the features, we also compute features over groups of frames. In addition, different color spaces have different characteristics, so we extract features from several of them, such as YUV, RGB and HSV.

● SIFT
SIFT features have achieved good performance in video analysis [3, 4, 5, 6, 7], so they are used again this year. First, we build a visual vocabulary from the SIFT points detected on keyframes with the Difference of Gaussians (DoG) detector [8], choosing more than 50 positive samples from the TRECVID 2007 development corpus for each concept. Second, using the K-means clustering algorithm, about 270,000 SIFT points are clustered into 1,000 classes, each class representing a visual keyword. Third, SIFT points are extracted from every test keyframe and each point is assigned to the nearest class. Finally, the number of occurrences of each visual word is recorded as a histogram, as sketched below.
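The following is a minimal Python sketch of this bag-of-visual-words pipeline, assuming OpenCV's SIFT implementation (which detects DoG keypoints) and scikit-learn's KMeans; the 1,000-word vocabulary size matches our setup, but all function and variable names are illustrative assumptions, not the original system code.

```python
# Bag-of-visual-words from SIFT points: a minimal sketch, assuming the
# OpenCV (cv2) SIFT implementation and scikit-learn's KMeans.
# All names here are illustrative.
import cv2
import numpy as np
from sklearn.cluster import KMeans

VOCAB_SIZE = 1000  # 1,000 visual keywords, as in the vocabulary above


def extract_sift(image_path):
    """Detect DoG keypoints and compute 128-d SIFT descriptors."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    _, descriptors = sift.detectAndCompute(img, None)
    return descriptors  # (n_points, 128) array, or None if no keypoints


def build_vocabulary(positive_keyframes):
    """Pool SIFT points over positive training keyframes and cluster
    them into VOCAB_SIZE classes (about 270,000 points in our setup)."""
    pool = [d for p in positive_keyframes
            if (d := extract_sift(p)) is not None]
    return KMeans(n_clusters=VOCAB_SIZE, n_init=4).fit(np.vstack(pool))


def bow_histogram(keyframe_path, vocabulary):
    """Assign each SIFT point of a test keyframe to its nearest visual
    word and record the number of occurrences as a histogram."""
    desc = extract_sift(keyframe_path)
    hist = np.zeros(VOCAB_SIZE)
    if desc is not None:
        for word in vocabulary.predict(desc):
            hist[word] += 1
    return hist  # raw counts; normalization happens in feature fusion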
● Gabor Wavelet
Gabor wavelets are usually applied at 5 scales, ν ∈ {0, 1, ..., 4}, and 8 orientations, μ ∈ {0, 1, ..., 7}. To reduce the runtime, we use only 3 scales, ν ∈ {0, 2, 4}, and 6 orientations, μ ∈ {0, 1, ..., 5}. Each frame is divided into 3×3 blocks, and the mean and standard deviation of each filter response within each block are calculated, so each keyframe is represented by a 3 × 6 × 9 × 2 = 324-dimensional feature vector.

● Edge Orientation Histogram [14]
The edge histogram descriptor represents the spatial distribution of five types of edges (0°, 45°, 90°, 135° and non-directional). Since edges play an important role in image perception, the descriptor can retrieve images with similar semantic meaning, especially natural images with a non-uniform edge distribution.

● Color Feature
Several color descriptors recommended by MPEG-7 are extracted, some of which have proved effective in testing:
(1) RGB color moments (9 dim)
(2) HSV color auto-correlogram (512 dim)
(3) HSV color histogram (256 dim)
(4) HSV group-of-frames histogram (256 dim)
(5) RGB histogram of blocks (576 dim)
(6) Average brightness (1 dim)

Since the features are complementary, a linear feature fusion scheme is applied before training to normalize the individual features and concatenate them into a single feature vector.

1.3 The framework of HLF

The framework plays a very important role in high-level feature extraction: it decides how the classifiers are chosen and how many must be trained. According to the state of the art, a single generic classifier [10, 11] does not obtain good performance, while hundreds of classifiers as in [1, 2, 6] are time-consuming and complicated. To balance performance, complexity and runtime, we propose the HLF framework shown in Fig. 1.1. Because of the good performance SVMs have achieved in the past few years [3, 4, 5, 6, 7], we adopt them to train our classifiers. For each concept, four SVM classifiers based on different feature granularities are trained, giving 80 classifiers in total for the 20 concepts. The LibSVM tool [9] with an RBF kernel is used. During testing, the keyframes of each shot are first extracted, and then several fusion strategies (vote, maximum probability, average probability) are applied to generate the final results; a sketch of the fusion step follows.
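As a rough illustration of the testing stage, the Python sketch below fuses the positive-class probabilities of the four per-concept SVM classifiers with either the maximum or the average scheme, the two schemes used in the submitted runs. It assumes scikit-learn's SVC (an RBF-kernel LibSVM wrapper) with probability estimates enabled and binary 0/1 labels; all names are illustrative assumptions rather than the original system code.

```python
# Per-concept score fusion over four SVM classifiers: a minimal sketch,
# assuming scikit-learn's SVC (an RBF-kernel LibSVM wrapper) with
# probability estimates enabled and binary 0/1 labels.
# All names are illustrative.
import numpy as np
from sklearn.svm import SVC


def train_concept_classifiers(feature_sets, labels):
    """Train one RBF-kernel SVM per feature granularity (four per concept).
    feature_sets: list of four (n_samples, dim_i) training matrices."""
    return [SVC(kernel="rbf", probability=True).fit(X, labels)
            for X in feature_sets]


def fuse_scores(classifiers, feature_sets, scheme="max"):
    """Fuse the positive-class probabilities of all classifiers
    for each test keyframe."""
    # One column of P(label == 1) per classifier.
    scores = np.column_stack([clf.predict_proba(X)[:, 1]
                              for clf, X in zip(classifiers, feature_sets)])
    if scheme == "max":   # maximum-probability fusion (BUPT_Sys1/3/5)
        return scores.max(axis=1)
    if scheme == "avg":   # average-probability fusion (BUPT_Sys2/4/6)
        return scores.mean(axis=1)
    raise ValueError(f"unknown fusion scheme: {scheme}")
```

The vote strategy mentioned above could be realized analogously by thresholding each column and taking a majority. Note that in Table 1.1 maximum fusion consistently scored slightly higher than average fusion for each annotation.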
Similar papers
BUPT-MCPRL at TRECVID 2011
In this paper, we describe the BUPT-MCPRL systems for TRECVID 2011. Our team participated in five tasks: semantic indexing, known-item search, instance search, content-based copy detection and surveillance event detection. A brief introduction is as follows. This year we proposed two different methods: one based on text and the other a bio-inspired method. All 2 runs we submitted are descri...
BUPT-MCPRL at TRECVID 2010
In this paper, we describe the BUPT-MCPRL systems for TRECVID 2012. Our team participated in three tasks: known-item search, instance search and surveillance event detection. A brief introduction is as follows. A. Known-item search: this year we submitted 4 automatic runs based on two different approaches, one of which is text-based and the other visual feature-based. Results of all 4 runs ...
BUPT at TRECVID 2007: Shot Boundary Detection
In this paper we describe our methodologies and evaluation results for shot boundary detection at TRECVID 2007. We submitted results for 10 runs based on SVM classifiers and several separate detectors. BUPT_01: default SVM parameters and a low threshold for the motion detector. BUPT_02: default SVM parameters and a low threshold for the edge detector. BUPT_03: a high penalty for false cuts to increase th...
BUPT-MCPRL at TRECVID 2009
This paper describes the BUPT-MCPRL systems for TRECVID 2009. We performed experiments in the automatic search, HLF extraction, copy detection and event detection tasks. A. Automatic search: a semantic-based video search system was proposed, and a brief description of the 10 submitted runs is shown in Table 1. Table 1. The performance of 10 runs for automatic search: Run ID, infMAP, Description. F_A_N_BUPT-MCPR1 0....
Knowledge Base Retrieval at TRECVID 2008
This paper describes the Knowledge Base multimedia retrieval system for the TRECVID 2008 evaluation. Our focus this year is on query analysis and the creation of a topic knowledge base using external knowledge base information.